4 research outputs found

    Modelling Customer Behaviour with Topic Models for Retail Analytics

    Get PDF
    Topic modelling is a scalable statistical framework that can model highly dimensional grouped data while keeping explanatory power. In the domain of grocery retail analytics, topic models have not been thoroughly explored. In this thesis, I show that topic models are powerful techniques to identify customer behaviours and summarise customer transactional data, providing valuable commercial value. This thesis has two objectives. First, to identify grocery shopping patterns that describe British food consumption, taking into account regional diversity and temporal variability. Second, to provide new methodologies that address the challenges of training topic models with grocery transactional data. These objectives are fulfilled across 3 research parts. In the first part, I introduce a framework to evaluate and summarise topic models. I propose to evaluate topic models in four aspects: generalisation, interpretability, distinctiveness and credibility. In this manner, topic models should represent the grocery transactional data fairly, providing coherent, distinctive and highly reliable grocery themes. Using a user study, I discuss thresholds that guide interpretation of topic coherence and similarity. We propose a clustering methodology to identify topics of low uncertainty by fusing multiple posterior samples. In the second part, I reinterpret the segmented topic model (STM) to accommodate grocery store metadata and identify spatially driven customer behaviours. This novel application harnesses store hierarchy over transactions to learn topics that are relevant within stores due to customised product assortments. Linear Gaussian Process regression complements the analysis to account for spatial autocorrelation and to investigate topics' spatial prevalence across the United Kingdom. In the third part, I propose a variation of the STM, the Sequential STM (SeqSTM), to accommodate time sequence over transactions and to learn time-specific customer behaviours. This model is inspired by the STM and the dynamic mixture model (DMM); however, the former does not naturally account for temporal sequence and the latter does not accommodate transactions' dependency on time variables. SeqSTM is suitable for learning topics where product assortment varies with respect to time, and where transactions are exchangeable within time slices. In this thesis, I identify customer behaviours that characterise British grocery retail. For instance, topics reveal natural groups of products that are used in the preparation of specific dishes, convey diets or outdoor activities, that are characteristic of festivities, household or pet ownership, that show a preference for brands, price or quality, etc. I have observed that customer behaviours vary regionally due to product availability and/or preference for specific products. In this manner, each constitutional country of the UK, the northern and the southern regions of England and London show a preference for different products. Finally, I show that customer behaviours may respond to seasonal product availability and/or are motivated by seasonal weather. For instance, consumption of tropical fruits around summer and of high-calorie foods during cold months

    Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability

    Get PDF
    Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping interpretable outcomes. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging, while individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures which may not adequately capture the qualitative aspects such as interpretability and stability of topics. In this paper, we introduce clustering methodology that post-processes posterior LDA draws to summarise the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, as well as associated measures of uncertainty. Furthermore, we establish a more holistic definition for model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data. We demonstrate that the selection of recurrent topics through our clustering methodology not only improves model likelihood but also outperforms the qualitative aspects of LDA such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain.Comment: 20 pages, 9 figure

    Regional Topics in British Grocery Retail Transactions

    Get PDF
    Understanding the customer behaviours behind transactional data has high commercial value in the grocery retail industry. Customers generate millions of transactions every day, choosing and buying products to satisfy specific shopping needs. Product availability may vary geographically due to local demand and local supply, thus driving the importance of analysing transactions within their corresponding store and regional context. Topic models provide a powerful tool in the analysis of transactional data, identifying topics that display frequently-bought-together products and summarising transactions as mixtures of topics. We use the Segmented Topic Model (STM) to capture customer behaviours that are nested within stores. STM not only provides topics and transaction summaries but also topical summaries at the store level that can be used to identify regional topics. We summarised the posterior distribution of STM by post-processing multiple posterior samples and selecting semantic modes represented as recurrent topics. We use linear Gaussian process regression to model topic prevalence across British territory while accounting for spatial autocorrelation. We implement our methods on a dataset of transactional data from a major UK grocery retailer and demonstrate that shopping behaviours may vary regionally and nearby stores tend to exhibit similar regional demand

    Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility

    Get PDF
    Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) allows processing grocery transactions and the discovering of customer behaviours. Interpretations of topic models typically exploit individual samples overlooking the uncertainty of single topics. Moreover, training LDA multiple times show topics with large uncertainty, that is, topics (dis)appear in some but not all posterior samples, concurring with various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions represented as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as coherence and distinctiveness of single topics and credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also outperforms interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain
    corecore